# Image description generation
Kimi VL A3B Thinking 6bit
Other
Kimi-VL-A3B-Thinking-6bit is a multilingual vision-language model converted to the MLX format, supporting image-text-to-text tasks.
Image-to-Text
Transformers Other

mlx-community
Pixtral 12b
Pixtral is a multimodal model based on the Mistral architecture that can handle image and text inputs and generate text outputs.
Image-to-Text
Transformers

saujasv
Pixtral 12b Nf4
Apache-2.0
A 4-bit quantized version of the Mistral community's Pixtral-12B, focused on image-text-to-text tasks and supporting Chinese-language description generation.
Image-to-Text
Transformers

SeanScripts
Qwen2 Vl Tiny Random
A small, randomly initialized debugging model built from the configuration of Qwen2-VL-7B-Instruct, for vision-language tasks.
Image-to-Text
Transformers

yujiepan
Blip Dalle3 Img2prompt
Fine-tuned from the BLIP model to infer, from an image generated by DALL·E 3, the prompt text likely used to create it.
Image-to-Text
Transformers Supports Multiple Languages

dblasko
Blip2 Opt 2.7b 8bit
MIT
BLIP-2 is a pre-trained vision-language model that combines an image encoder with a large language model for image-to-text generation.
Image-to-Text
Transformers English

Mediocreatmybest